Search CORE

6 research outputs found

Inferring Genomic Sequences

Author: Astrovskaya Irina A
Publication venue: ScholarWorks @ Georgia State University
Publication date: 01/01/2011

Recent advances in next-generation sequencing have provided unprecedented opportunities for high-throughput genomic research, inexpensively producing millions of genomic sequences in a single run. Analysis of massive volumes of data results in a more accurate picture of genome complexity and requires adequate bioinformatics support. We explore computational challenges of applying next-generation sequencing to particular applications, focusing on the problem of reconstructing the viral quasispecies spectrum from pyrosequencing shotgun reads and the problem of inferring informative single nucleotide polymorphisms (SNPs) that statistically cover the genetic variation of a genome region in genome-wide association studies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but unfortunately standard assembly software cannot be used to simultaneously assemble and estimate the abundance of multiple closely related (but non-identical) quasispecies sequences. Here, we introduce a new Viral Spectrum Assembler (ViSpA) for inferring the quasispecies spectrum and compare it with the state-of-the-art ShoRAH tool on both synthetic and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. While ShoRAH has an advanced error correction algorithm, ViSpA is better at quasispecies assembly, producing a more accurate reconstruction of a viral population. We also foresee applying ViSpA to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations. Due to the large data volume in genome-wide association studies, it is desirable to find a small subset of SNPs (tags) that covers the genetic variation of the entire set. We explore the trade-off between the number of tags used per non-tagged SNP and possible overfitting and propose an efficient 2LR-Tagging heuristic

CiteSeerX

ScholarWorks @ Georgia State University

Individual-specific changes in the human gut microbiota after challenge with enterotoxigenic Escherichia coli and subsequent ciprofloxacin treatment

Author: Astrovskaya Irina
Chakraborty Subhra
Corrada Bravo Héctor
Harro Clayton
Li Shan
Lindsay Brianna R.
Parkhill Julian
Paulson Joseph N.
Pop Mihai
Sack David A.
Stine O. Colin
Walker Alan W.
Walker Richard I.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 08/06/2016

Acknowledgements: The authors wish to thank Mark Stares, Richard Rance, and other members of the Wellcome Trust Sanger Institute’s 454 sequencing team for generating the 16S rRNA gene data. Lili Fox Vélez provided editorial support.
Funding: IA, JNP, and MP were partly supported by the NIH, grants R01-AI-100947 to MP and R21-GM-107683 to Matthias Chung, subcontract to MP. JNP was partly supported by NSF graduate fellowship number DGE750616. IA, JNP, BRL, OCS, and MP were supported in part by the Bill and Melinda Gates Foundation, award number 42917 to OCS. JP and AWW received core funding support from The Wellcome Trust (grant number 098051). AWW and the Rowett Institute of Nutrition and Health, University of Aberdeen, receive core funding support from the Scottish Government Rural and Environmental Science and Analysis Service (RESAS). Peer reviewed. Publisher PD

Aberdeen University Research

PubMed Central

FigShare

Inferring viral quasispecies spectra from 454 pyrosequencing reads

Author: A Sundquist
Alex Zelikovsky
AR Quinlan
B Gaschen
Bassam Tork
D Brinza
DC Douek
E Domingo
E Martinez-Salas
EA Duarte
G Myers
H Fakhrai-Rad
Ion Măndoiu
Irina Astrovskaya
JC de la Torre
JC Venter
JI Esteban
JJ Holland
JW Drake
K Westbrooks
Kelly Westbrooks
M Eigen
M Margulies
MC Prosperi
MJ Chaisson
N Beerenwinkel
N Eriksson
NM Laird
O Zagordi
Peter Balfe
R Lippert
S Balser
S Hoffmann
S-Y Rhee
Serghei Mangul
SL Fishman
ST O’Neil
T von Hahn
V Bansal
W Brockman
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: RNA viruses infecting a host usually exist as a set of closely related sequences, referred to as quasispecies. The genomic diversity of viral quasispecies is a subject of great interest, particularly for chronic infections, since it can lead to resistance to existing therapies. High-throughput sequencing is a promising approach to characterizing viral diversity, but unfortunately standard assembly software was originally designed for single genome assembly and cannot be used to simultaneously assemble and estimate the abundance of multiple closely related quasispecies sequences.
Results: In this paper, we introduce a new Viral Spectrum Assembler (ViSpA) method for quasispecies spectrum reconstruction and compare it with the state-of-the-art ShoRAH tool on both simulated and real 454 pyrosequencing shotgun reads from HCV and HIV quasispecies. Experimental results show that ViSpA outperforms ShoRAH on simulated error-free reads, correctly assembling 10 out of 10 quasispecies and 29 sequences out of 40 quasispecies. While ShoRAH has a significant advantage over ViSpA on reads simulated with sequencing errors due to its advanced error correction algorithm, ViSpA is better at assembling the simulated reads after they have been corrected by ShoRAH. ViSpA also outperforms ShoRAH on real 454 reads. Indeed, the 7 most frequent sequences reconstructed by ViSpA from a real HCV dataset are viable (do not contain internal stop codons), and the most frequent sequence was within 1% of the actual open reading frame obtained by cloning and Sanger sequencing. In contrast, only one of the sequences reconstructed by ShoRAH is viable. On a real HIV dataset, ShoRAH correctly inferred only 2 quasispecies sequences with at most 4 mismatches, whereas ViSpA correctly reconstructed 5 quasispecies with at most 2 mismatches, and 2 out of 5 sequences were inferred without any mismatches. ViSpA source code is available at http://alla.cs.gsu.edu/~software/VISPA/vispa.html.
Conclusions: ViSpA enables accurate viral quasispecies spectrum reconstruction from 454 pyrosequencing reads. We are currently exploring extensions applicable to the analysis of high-throughput sequencing data from bacterial metagenomic samples and ecological samples of eukaryote populations.

Crossref

ScholarWorks @ Georgia State University

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central


Exome sequencing of 457 autism families recruited online provides evidence for autism risk genes

Author: Astrovskaya Irina
Barnard Rebecca
Brueggeman Leo
Chung Wendy K
Eichler Evan E
Feliciano Pamela
Gibbs Richard A
Hsieh Alexander
Michaelson Jacob J
Muzny Donna M
O’Roak Brian J
Sabo Aniko
Shen Yufeng
Snyder LeeAnne Green
Turner Tychele N
Volfovsky Natalia
Wang Tianyun
Zhou Xueya
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

Autism spectrum disorder (ASD) is a genetically heterogeneous condition, caused by a combination of rare de novo and inherited variants as well as common variants in at least several hundred genes. However, significantly larger sample sizes are needed to identify the complete set of genetic risk factors. We conducted a pilot study for SPARK (SPARKForAutism.org) of 457 families with ASD, all consented online. Whole exome sequencing (WES) and genotyping data were generated for each family using DNA from saliva. We identified variants in genes and loci that are clinically recognized causes or significant contributors to ASD in 10.4% of families without previous genetic findings. In addition, we identified variants that are possibly associated with ASD in an additional 3.4% of families. A meta-analysis using the TADA framework at a false discovery rate (FDR) of 0.1 provides statistical support for 26 ASD risk genes. While most of these genes are already known ASD risk genes, BRSK2 has the strongest statistical support and reaches genome-wide significance as a risk gene for ASD (p-value = 2.3e-06). Future studies leveraging the thousands of individuals with ASD who have enrolled in SPARK are likely to further clarify the genetic risk factors associated with ASD as well as accelerate ASD research that incorporates genetic etiology.

eScholarship - University of California

Individual-specific changes in the human gut microbiota after challenge with enterotoxigenic Escherichia coli and subsequent ciprofloxacin treatment

Author: A Kassinen
AH Havelaar
AL Bourgeois
AL Servin
Alan W. Walker
AM Svennerholm
AW Walker
B Lindsay
B Stecher
BR Lindsay
Brianna R. Lindsay
C Harro
C Jernberg
C Ubeda
CA Warren
CG Buffie
CL Maynard
Clayton Harro
David A. Sack
EC Wick
F Backhed
F Qadri
GD Wu
HE Jakobsson
Héctor Corrada Bravo
Irina Astrovskaya
JL Round
JN Paulson
Joseph N. Paulson
JR Cole
Julian Parkhill
KM Keeney
L Beaugerie
L Dethlefsen
LV Hooper
M Candela
M Ghodsi
M Li
M Pop
MH Wilcox
Mihai Pop
MJ Claesson
MU Rashid
O Koren
O. Colin Stine
RC Edgar
Richard I. Walker
S Macfarlane
SD Nyberg
SE McGarr
Shan Li
SJ O’Keefe
SK Mazmanian
SR Gill
Subhra Chakraborty
TD Lawley
Publication venue: 'Springer Science and Business Media LLC'

Crossref


Integrated gene analyses of de novo variants from 46,612 trios with autism and developmental disorders

Author: Abbeduto Leonard
Acampado John
Ace Andrea J
Ahmed Kelli L Baalman
Albright Charles
Alessandri Michael
Amaral David G
Amatya Alpha
Annett Robert D
Arriaga Ivette
Astrovskaya Irina
Bahl Ethan
Bakken Trygve E
Barnard Rebecca A
Bashar Asif
Bernier Raphael A
Bradley Catherine C
Brooks Elizabeth
Brown Melissa
Brueggeman Leo
Butler Martin E
Butter Eric M
Camba Alexies
Carpenter Laura A
Cartner Lindsey A
Chin Wubin
Chung Wendy K
Consortium The SPARK
Coury Daniel Lee
Daniels Amy M
David Giancarla
Dennis Brandy
Eichler Evan E
Eldred Sara
Feliciano Pamela
Fleisch Chris
Fox Emily A
Ganesan
Gerdts Jennifer A
Gibbs Richard A
Gilissen Christian
Gillentine Madelyn A
Gulsrud Amanda C
Gutierrez Anibal
Gwynette Frampton
Hale Melissa N
Haley Monica
Hanna Nathan
Henning Barbara
Herbert Lynette M
Higgins Lorrin
Hilscher Brittani A
Hsieh Alexander
Jensen William
Jones Mark
Kim Chang N
Koomar Tanner
Lash Alex E
Li Deana
Manning Patricia
Mao Yafei
Marini Richard
McCracken James T
Michaelson Jacob J
Muzny Donna
Myers Vincent J
Nowakowski Tomasz J
O'Connor Eirene
O'Roak Brian J
Pifher Taylor
Rigby Chris
Robertson Beverly E
Roby Erin
Sabo Aniko
Sandhu Sophia
Sarver Dustin E
Scherr Jessica
Schneider Hoa Lam
Shaffer Rebecca
Shah Neelay
Shah Swapnil
Shen Yufeng
Siegel Matthew
Singer Emily
Smith Kaitlin
Snyder LeeAnne G
Stephens Alexandra N
Tafolla Maira
Thomas Carrie
Thomas Taylor R
Thompson Samantha
Tjernagel Jennifer
Turner Tychele N
Vernoia Brianna M
Volfovsky Natalia
Wang Tianyun
White Loran Casey
Yang Wha S
Zhou Xueya
Publication venue: eScholarship, University of California
Publication date: 15/11/2022

Most genetic studies consider autism spectrum disorder (ASD) and developmental disorder (DD) separately despite overwhelming comorbidity and shared genetic etiology. Here, we analyzed de novo variants (DNVs) from 15,560 ASD (6,557 from SPARK) and 31,052 DD trios independently and also combined as broader neurodevelopmental disorders (NDDs) using three models. We identify 615 NDD candidate genes (false discovery rate [FDR] < 0.05) supported by ≥1 models, including 138 reaching Bonferroni exome-wide significance (P < 3.64e-7) in all models. The genes group into five functional networks associating with different brain developmental lineages based on single-cell nuclei transcriptomic data. We find no evidence for ASD-specific genes in contrast to 18 genes significantly enriched for DD. There are 53 genes that show mutational bias, including enrichments for missense (n = 41) or truncating (n = 12) DNVs. We also find 10 genes with evidence of male- or female-bias enrichment, including 4 X chromosome genes with significant female burden (DDX3X, MECP2, WDR45, and HDAC8). This large-scale integrative analysis identifies candidates and functional subsets of NDD genes.

eScholarship - University of California